Investigate failure on truffleruby-head + macOS + XCode 14.2 #75

Closed

eregon commented Jan 13, 2024

From #73 (comment)

Run bundle exec rake compile test
mkdir -p lib
mkdir -p tmp/x86_64-darwin20/zlib/3.2.2
/Users/runner/.rubies/truffleruby-head/bin/ruby -I. ../../../../ext/zlib/extconf.rb
cd tmp/x86_64-darwin20/zlib/3.2.2
checking for deflateReset(NULL) in -lz... yes
checking for crc32_combine() in zlib.h... yes
checking for adler32_combine() in zlib.h... yes
checking for z_crc_t in zlib.h... yes
checking for z_size_t in zlib.h... yes
checking for crc32_z() in zlib.h... yes
checking for adler32_z() in zlib.h... yes
creating Makefile
cd -
cd tmp/x86_64-darwin20/zlib/3.2.2
/usr/bin/make
compiling ../../../../ext/zlib/zlib.c
linking shared-object zlib.bundle
ld: warning: -undefined dynamic_lookup may not work with chained fixups
cd -
/usr/bin/make install sitearchdir=../../../../lib sitelibdir=../../../../lib target_prefix=
mkdir -p tmp/x86_64-darwin20/stage/lib
/usr/bin/install -c -m 0755 zlib.bundle ../../../../lib
cp tmp/x86_64-darwin20/zlib/3.2.2/zlib.bundle tmp/x86_64-darwin20/stage/lib/zlib.bundle
<internal:core> core/kernel.rb:234:in `gem_original_require': dlopen(/Users/runner/work/zlib/zlib/lib/zlib.bundle, 0x0009): symbol not found in flat namespace (_rb_econv_check_error) (RuntimeError)
	from <internal:/Users/runner/.rubies/truffleruby-head/lib/mri/rubygems/core_ext/kernel_require.rb>:37:in `require'
	from /Users/runner/work/zlib/zlib/test/zlib/test_zlib.rb:10:in `<top (required)>'
	from <internal:core> core/kernel.rb:234:in `gem_original_require'
	from <internal:/Users/runner/.rubies/truffleruby-head/lib/mri/rubygems/core_ext/kernel_require.rb>:37:in `require'
	from /Users/runner/work/zlib/zlib/vendor/bundle/truffleruby/3.2.2.9/gems/rake-13.1.0/lib/rake/rake_test_loader.rb:21:in `block in <main>'
	from /Users/runner/work/zlib/zlib/vendor/bundle/truffleruby/3.2.2.9/gems/rake-13.1.0/lib/rake/rake_test_loader.rb:6:in `select'
	from /Users/runner/work/zlib/zlib/vendor/bundle/truffleruby/3.2.2.9/gems/rake-13.1.0/lib/rake/rake_test_loader.rb:6:in `<main>'
rake aborted!
Command failed with status (1)
/Users/runner/work/zlib/zlib/rakefile:10:in `block in <top (required)>'
/Users/runner/work/zlib/zlib/vendor/bundle/truffleruby/3.2.2.9/gems/rake-13.1.0/exe/rake:27:in `<top (required)>'
<internal:core> core/kernel.rb:383:in `load'
<internal:core> core/kernel.rb:383:in `load'
<internal:core> core/kernel.rb:383:in `load'
/Users/runner/.rubies/truffleruby-head/bin/bundle:44:in `<main>'
Tasks: TOP => test_internal
(See full trace by running task with --trace)

eregon commented Jan 13, 2024

Hmm, moving the call into a separate function does not seem to help: https://github.com/ruby/zlib/actions/runs/7512241958/job/20452716267?pr=75
Although maybe clang inlines that function, in which case the move has no effect.

Looking at the dlopen man pages for RTLD_LAZY, there are some differences which could explain what I guessed above, but I'm not sure:
macOS: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/dlopen.3.html

     RTLD_LAZY   Each external function reference is bound the first time the
                 function is called.

Though that does sound like it should only resolve it when the external function is called, not when the caller function is called, so probably my guess is wrong.

Linux (man dlopen):

       RTLD_LAZY
              Perform lazy binding.  Resolve symbols only as the code that references them is executed.  If the symbol is
              never referenced, then it is never resolved.  (Lazy binding is performed only for function references;
              references to variables are always immediately bound when the shared object is loaded.)  Since glibc 2.1.1,
              this flag is overridden by the effect of the LD_BIND_NOW environment variable.

My next guess is that the tests do something different on macOS and trigger the dummy encoding path (which uses rb_econv_check_error), while they don't on Linux.

eregon commented Jan 13, 2024

https://github.com/ruby/zlib/actions/runs/7512241958/job/20452716267?pr=75#step:4:32
So it fails on the require 'zlib' line.
That's weird, it's as if it was using RTLD_NOW instead of RTLD_LAZY.
I'll keep investigating next week.

eregon commented Jan 13, 2024

dlopen(/Users/runner/work/zlib/zlib/lib/zlib.bundle, 0x0009): symbol not found in flat namespace (_rb_econv_check_error)
So I should check what the 0x0009 means in terms of flags.

eregon force-pushed the fix-truffleruby-macos-failure branch from 6d1243c to f66f192 on January 13, 2024 12:08

eregon commented Jan 13, 2024

If https://opensource.apple.com/source/dyld/dyld-239.3/include/dlfcn.h.auto.html is correct (but is it?),
then 0x0009 is 8 (RTLD_GLOBAL) + 1 (RTLD_LAZY), which are the flags we set in TruffleRuby, as expected.
But then it still behaves as if it were non-lazy, which is weird.
Maybe Init_zlib is already eagerly loading these symbols because it uses rb_gzreader_getc and that would inline everything? It doesn't seem likely though.
Weird indeed.
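
To double-check the arithmetic, here is a small Ruby sketch that splits a dlopen mode into named flags. The flag values are copied from the dlfcn.h linked above and are assumed to match what the runner's macOS uses; `MACOS_DLOPEN_FLAGS` and `decode_dlopen_mode` are hypothetical names for illustration, not TruffleRuby API.

```ruby
# Flag values as listed in the macOS dlfcn.h linked above (assumed accurate
# for the CI runner; other platforms use different values).
MACOS_DLOPEN_FLAGS = {
  0x1 => :RTLD_LAZY,
  0x2 => :RTLD_NOW,
  0x4 => :RTLD_LOCAL,
  0x8 => :RTLD_GLOBAL,
}.freeze

# Hypothetical helper: return the names of all flags set in a dlopen mode.
def decode_dlopen_mode(mode)
  MACOS_DLOPEN_FLAGS.select { |bit, _name| (mode & bit) != 0 }.values
end

p decode_dlopen_mode(0x0009) # => [:RTLD_LAZY, :RTLD_GLOBAL]
```

So the mode in the error message does not contain RTLD_NOW, which is consistent with the flags being passed correctly and the eager resolution coming from somewhere else.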

eregon force-pushed the fix-truffleruby-macos-failure branch from f66f192 to 18ab33d on January 13, 2024 12:14
eregon force-pushed the fix-truffleruby-macos-failure branch from 6baf188 to b418041 on January 14, 2024 20:41

eregon commented Jan 14, 2024

ld: warning: -undefined dynamic_lookup may not work with chained fixups
Maybe that's the issue, and it causes symbols not to be resolved lazily?

eregon commented Jan 14, 2024

ld: warning: -undefined dynamic_lookup may not work with chained fixups
Maybe that's the issue, and it causes symbols not to be resolved lazily?

Yeah that seems to be it.
It works fine on macOS 11/XCode 13.2: https://github.com/ruby/zlib/actions/runs/7521661799/job/20472764087?pr=75
And it fails on macOS 12/XCode 14.2: https://github.com/ruby/zlib/actions/runs/7521608329/job/20472641124?pr=75
It works fine on macOS 13/XCode 15.1: https://github.com/ruby/zlib/actions/runs/7532280338/job/20502568502?pr=75

So it seems to be the same issue that CRuby had for XCode 14 in:

Also it might be fixed in XCode 14.3: python/cpython#97524 (comment)
But the macos-latest/macos-12 image uses 14.2 :/

eregon added a commit to eregon/zlib that referenced this pull request Jan 15, 2024
* macos-latest, which currently resolves to macos-12, uses XCode 14.2,
  which has a known bug with -undefined dynamic_lookup that causes
  truffleruby to fail:
  ruby#75 (comment)
eregon mentioned this pull request Jan 15, 2024

eregon commented Jan 15, 2024

For now let's use macos-13 so we avoid the problematic XCode 14.2: #76

eregon changed the title from "Try to fix failure on truffleruby-head + macOS" to "Investigate failure on truffleruby-head + macOS + XCode 14.2" on Jan 15, 2024
eregon closed this Jan 15, 2024
eregon reopened this Jan 16, 2024
eregon closed this Jan 16, 2024

eregon commented Jan 16, 2024

macos-12 (same as macos-latest) + MACOSX_DEPLOYMENT_TARGET=11.0 works as well: https://github.com/eregon/zlib/actions/runs/7543545007/job/20534822113
That seems like a better fix in general than having to test on a macOS image other than macos-12.
We should only do it for XCode 14.2 though.
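
A rough sketch of how that gating could look in Ruby, assuming `xcodebuild -version` prints a first line such as `Xcode 14.2`. The helper name `deployment_target_env` is hypothetical and nothing like it exists in ruby/zlib; where it would be called from (CI script, Rakefile) is left open.

```ruby
# Hypothetical helper, not part of ruby/zlib: given the text printed by
# `xcodebuild -version`, return extra environment variables for the build.
# Assumes the first line looks like "Xcode 14.2".
def deployment_target_env(xcodebuild_output)
  version = xcodebuild_output[/\AXcode (\d+(?:\.\d+)*)/, 1]
  # Only XCode 14.2 was observed failing in this thread, so match only that.
  version == "14.2" ? { "MACOSX_DEPLOYMENT_TARGET" => "11.0" } : {}
end

# Sample input; on a real runner this would come from `xcodebuild -version`.
env = deployment_target_env("Xcode 14.2\nBuild version 14C18\n")
p env
# The hash could then be passed as the first argument to Kernel#system, e.g.
#   system(env, "bundle", "exec", "rake", "compile", "test")
```

For any other XCode version the helper returns an empty hash, so the build environment stays unchanged.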
